The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into predefined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, using such a collection does not handle name variants and cannot resolve the ambiguities that arise when identifying entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities using a small set of training data. For the identified named entities, the word and context features are used to define a pattern. The pattern for each named entity category is then used as a seed pattern to identify named entities in the test set. Pattern scoring and tuple value scores enable the generation of new patterns for identifying the named entity categories. We evaluated the proposed system for English on tagged (IEER) and untagged (CoNLL 2003) named entity corpora, and for Tamil on documents from the FIRE corpus, and obtained an average F-measure of 75% for both languages.